Clean up NullObservations from the stream #6260

Merged on Feb 19, 2025 (9 commits)

enyst (Collaborator) commented Jan 14, 2025

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions


Link of any specific issues this addresses


To run this PR locally, use the following command:

docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --add-host host.docker.internal:host-gateway \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:66acf0c-nikolaik \
  --name openhands-app-66acf0c \
  docker.all-hands.dev/all-hands-ai/openhands:66acf0c

@enyst enyst marked this pull request as draft January 14, 2025 07:25
@enyst enyst marked this pull request as ready for review February 2, 2025 00:46
@enyst enyst requested a review from rbren February 4, 2025 15:44

Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Contributor

Trigger by: Pull Request (integration-test label on PR #6260)
Commit: 27ef869
Integration Tests Report (Haiku)
Haiku LLM Test Results:
Success rate: 100.00% (7/7)

Total cost: USD 0.10

| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t07_interactive_commands | True | | 0.012 | nan |
| t03_jupyter_write_file | True | | 0.014 | nan |
| t01_fix_simple_typo | True | | 0.015 | nan |
| t02_add_bash_hello | True | | 0.007 | nan |
| t06_github_pr_browsing | True | | 0.031 | nan |
| t05_simple_browsing | True | | 0.007 | nan |
| t04_git_staging | True | | 0.013 | nan |

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
Success rate: 42.86% (3/7)

Total cost: USD 0.01

| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t01_fix_simple_typo | False | File not fixed: This is a stupid typoo. Really? No mor typos! Enjoy! | 0.001 | RuntimeError: There was an unexpected error while running the agent. Please report this error to the developers. Your session ID is default. Error type: APIConnectionError |
| t07_interactive_commands | False | The answer is not found in any message. Total messages: 1. | 0.001 | RuntimeError: There was an unexpected error while running the agent. Please report this error to the developers. Your session ID is default. Error type: APIConnectionError |
| t03_jupyter_write_file | True | | 0.001 | RuntimeError: There was an unexpected error while running the agent. Please report this error to the developers. Your session ID is default. Error type: APIConnectionError |
| t05_simple_browsing | False | The answer is not found in any message. Total messages: 1. | 0.001 | RuntimeError: There was an unexpected error while running the agent. Please report this error to the developers. Your session ID is default. Error type: APIConnectionError |
| t04_git_staging | True | | 0.002 | |
| t06_github_pr_browsing | False | The answer is not found in any message. Total messages: 1. | 0.001 | RuntimeError: There was an unexpected error while running the agent. Please report this error to the developers. Your session ID is default. Error type: APIConnectionError |
| t02_add_bash_hello | True | | 0.002 | |

Integration Tests Report Delegator (Haiku)
Success rate: 50.00% (1/2)

Total cost: USD 0.06

| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t02_add_bash_hello | True | | 0.019 | nan |
| t01_fix_simple_typo | False | File not fixed: This is a corrected text. Really? No more typos! Enjoy! | 0.039 | nan |

Integration Tests Report Delegator (DeepSeek)
Success rate: 100.00% (2/2)

Total cost: USD 0.00

| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t01_fix_simple_typo | True | | 0.001 | nan |
| t02_add_bash_hello | True | | 0.002 | nan |

Integration Tests Report VisualBrowsing (DeepSeek)
Success rate: 100.00% (1/1)

Total cost: USD 0.00

| instance_id | success | reason | cost | error_message |
|---|---|---|---|---|
| t05_simple_browsing | True | | 0.001 | nan |

Download testing outputs (includes both Haiku and DeepSeek results): Download

@enyst enyst merged commit 663e361 into main Feb 19, 2025
19 checks passed
@enyst enyst deleted the enyst/null-obs branch February 19, 2025 19:40